Assignment_4: Swimming Scholarship Enrollment Clustering

image.png

Data Dictionary

image.png

Coding:

Part 1: K-Means Clustering (Coding) - 35%

1. Data Exploration and Preprocessing

o Load and inspect the dataset.

o Perform Exploratory Data Analysis (EDA) to understand the distribution and patterns.

o Separate categorical and numerical features.

o Apply One-Hot Encoding to categorical variables.

o Standardize/Normalize numerical features for clustering.
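
The two preprocessing steps listed above (one-hot encoding and normalization) can be sketched with the standard library alone. This is only an illustration of what the transformations do, not the notebook's implementation: the helper names `one_hot` and `min_max` and the toy values are assumptions made up for this example.

```python
# Minimal sketch of one-hot encoding and min-max scaling (standard library only).
# The toy records below are invented for illustration.

def one_hot(values):
    """Return (categories, rows), where each row is a list of 0/1 indicators."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def min_max(values):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

strokes = ["Butterfly", "Freestyle", "Butterfly"]  # hypothetical categorical column
times_100m = [50.0, 60.0, 70.0]                    # hypothetical numerical column

categories, encoded = one_hot(strokes)
scaled = min_max(times_100m)
print(categories, encoded, scaled)
```

Scaling matters for distance-based clustering because a feature measured in hundreds of seconds would otherwise dominate one measured on a 2 to 4 GPA scale.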

In [1]:
import warnings

# Step 1: Import Required Libraries
# Libraries for data manipulation, visualization, and clustering
import numpy as np  # For numerical operations
import pandas as pd  # For data manipulation and analysis
import matplotlib.pyplot as plt  # For plotting graphs
import seaborn as sns  # For statistical data visualization
import scipy.cluster.hierarchy as sch  # For dendrograms
from scipy.spatial import Voronoi, voronoi_plot_2d
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering  # Clustering algorithms
from sklearn.preprocessing import StandardScaler, MinMaxScaler  # For scaling features
from sklearn.metrics import silhouette_score, silhouette_samples  # For evaluating clustering performance

# Ignore warnings for cleaner output
warnings.filterwarnings("ignore")

# Mount Google Drive (specific to Google Colab) to access data files stored on Google Drive
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [2]:
import os

# Define the file path
file_path = '/content/drive/My Drive/Assignment4/swimming_scholarship_dataset_expanded.csv'  # Replace with your file's path
# Check if the file exists
if os.path.exists(file_path):
    print("File exists!")
else:
    print("File does not exist.")
File exists!
In [3]:
df = pd.read_csv(file_path)
In [4]:
df.sample(3)
Out[4]:
Application ID Gender Age State High School GPA Swimming Time 100m (sec) Swimming Time 200m (sec) Swimming Time 400m (sec) Swimming Type Distance Specialization Swim Club Membership Years Competitive Swimming Height (cm) Weight (kg) Academic Interest Parent Support Level
299 APP0300 Female 14 NC 2.10 50.49 121.62 222.14 Butterfly 400m Yes 4 174 80 Biology High
459 APP0460 Male 16 OH 2.91 54.21 132.81 284.41 Freestyle 400m No 1 195 84 Economics Medium
138 APP0139 Female 15 PA 3.91 60.79 135.45 370.12 Breaststroke 100m Yes 2 186 98 Mathematics High
In [5]:
# Check the basic structure of the dataset
print(df.info())
print(df.describe())

# Display three random rows to get a feel for the data
df.sample(3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 16 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Application ID              500 non-null    object 
 1   Gender                      500 non-null    object 
 2   Age                         500 non-null    int64  
 3   State                       500 non-null    object 
 4   High School GPA             500 non-null    float64
 5   Swimming Time 100m (sec)    500 non-null    float64
 6   Swimming Time 200m (sec)    500 non-null    float64
 7   Swimming Time 400m (sec)    500 non-null    float64
 8   Swimming Type               500 non-null    object 
 9   Distance Specialization     500 non-null    object 
 10  Swim Club Membership        500 non-null    object 
 11  Years Competitive Swimming  500 non-null    int64  
 12  Height (cm)                 500 non-null    int64  
 13  Weight (kg)                 500 non-null    int64  
 14  Academic Interest           500 non-null    object 
 15  Parent Support Level        500 non-null    object 
dtypes: float64(4), int64(4), object(8)
memory usage: 62.6+ KB
None
              Age  High School GPA  Swimming Time 100m (sec)  \
count  500.000000       500.000000                500.000000   
mean    15.970000         2.978520                 60.173380   
std      1.425891         0.587528                  5.883978   
min     14.000000         2.010000                 50.130000   
25%     15.000000         2.447500                 54.925000   
50%     16.000000         2.965000                 60.390000   
75%     17.000000         3.492500                 65.347500   
max     18.000000         4.000000                 69.960000   

       Swimming Time 200m (sec)  Swimming Time 400m (sec)  \
count                500.000000                500.000000   
mean                 125.153360                299.176960   
std                   14.514755                 56.941798   
min                  100.250000                200.310000   
25%                  112.012500                251.987500   
50%                  125.935000                300.345000   
75%                  137.040000                346.507500   
max                  149.970000                399.670000   

       Years Competitive Swimming  Height (cm)  Weight (kg)  
count                  500.000000   500.000000   500.000000  
mean                     5.376000   174.844000    72.856000  
std                      2.844434    14.903822    16.470027  
min                      1.000000   150.000000    45.000000  
25%                      3.000000   162.750000    59.000000  
50%                      5.000000   175.000000    73.000000  
75%                      8.000000   188.000000    87.000000  
max                     10.000000   200.000000   100.000000  
Out[5]:
Application ID Gender Age State High School GPA Swimming Time 100m (sec) Swimming Time 200m (sec) Swimming Time 400m (sec) Swimming Type Distance Specialization Swim Club Membership Years Competitive Swimming Height (cm) Weight (kg) Academic Interest Parent Support Level
349 APP0350 Male 18 GA 3.88 59.63 110.76 381.93 Freestyle 100m Yes 10 156 96 Physics High
198 APP0199 Female 18 IL 2.34 51.16 109.59 383.02 Butterfly 200m No 8 171 59 Physics High
90 APP0091 Male 16 NC 2.55 55.44 111.36 322.05 Butterfly 400m No 6 154 65 Economics High
In [6]:
# Check for missing values in each column
print(df.isnull().sum())
Application ID                0
Gender                        0
Age                           0
State                         0
High School GPA               0
Swimming Time 100m (sec)      0
Swimming Time 200m (sec)      0
Swimming Time 400m (sec)      0
Swimming Type                 0
Distance Specialization       0
Swim Club Membership          0
Years Competitive Swimming    0
Height (cm)                   0
Weight (kg)                   0
Academic Interest             0
Parent Support Level          0
dtype: int64
In [7]:
df = df.drop(columns=['Application ID', 'Academic Interest', 'Age', 'Height (cm)', 'Weight (kg)', 'Parent Support Level'])
In [8]:
numerical_columns = df.select_dtypes(include=['number'])
In [9]:
categorical_columns = df.select_dtypes(include=['object', 'category'])
In [10]:
print("Numerical columns:", numerical_columns.columns)
print("Categorical columns:", categorical_columns.columns)
Numerical columns: Index(['High School GPA', 'Swimming Time 100m (sec)',
       'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
       'Years Competitive Swimming'],
      dtype='object')
Categorical columns: Index(['Gender', 'State', 'Swimming Type', 'Distance Specialization',
       'Swim Club Membership'],
      dtype='object')
In [ ]:
# Plot count plots for each categorical column with a custom palette and annotations
for column in categorical_columns:
   plt.figure(figsize=(10, 6))  # Increased figure size
   ax = sns.countplot(data=df, x=column, palette='viridis')  # Changed to viridis palette

   # Add count annotations on top of the bars
   for p in ax.patches:
       ax.annotate(f'{int(p.get_height())}',  # Convert to integer
                  (p.get_x() + p.get_width() / 2., p.get_height()),
                  ha='center', va='bottom',     # Changed va to bottom
                  xytext=(0, 5),                # Increased text offset
                  fontsize=10,                  # Added fontsize
                  textcoords='offset points')

   plt.title(f'Distribution of {column}', fontsize=12, pad=15)  # Better title
   plt.xticks(rotation=30)  # Changed rotation angle
   plt.tight_layout()  # Added tight_layout
   plt.show()
[Figures: count plots showing the distribution of each categorical column]
In [ ]:
import seaborn as sns
import matplotlib.pyplot as plt

# Select the numerical columns
numerical_columns = df.select_dtypes(include=['number'])

# Plot a histogram with a KDE overlay for each numerical column
for column in numerical_columns:
    plt.figure(figsize=(8, 4))
    sns.histplot(df[column], kde=True, color='skyblue')  # Change color to 'skyblue'
    plt.title(f'Distribution of {column}')
    plt.show()
[Figures: histograms showing the distribution of each numerical column]
In [ ]:
import seaborn as sns
import matplotlib.pyplot as plt

# Define numerical features
numerical_features = [ 'High School GPA', 'Swimming Time 100m (sec)',
                    'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
                    'Years Competitive Swimming']

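# Create figure and grid of subplots with more height
fig, axes = plt.subplots(2, 4, figsize=(20, 10))
axes = axes.flatten()  # Flatten to 1D array for easier indexing

# Hide the unused subplots (the 2x4 grid has 8 slots, but only 5 features are plotted)
for ax in axes[len(numerical_features):]:
   ax.set_visible(False)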

# Define custom colors with better contrast
colors = ['#2E86C1', '#E74C3C', '#27AE60', '#8E44AD',
         '#F39C12']

# Loop through numerical features and plot each in a subplot
for i, feature in enumerate(numerical_features):
   sns.boxplot(y=df[feature], ax=axes[i], color=colors[i])
   axes[i].set_title(f'Boxplot of {feature}', fontsize=12, pad=10)
   axes[i].set_ylabel(feature, fontsize=10)
   # Use a smaller font for the tick labels
   axes[i].tick_params(axis='both', labelsize=9)

plt.tight_layout(pad=3.0)
plt.show()
[Figure: boxplots of the five numerical features]
In [ ]:
# Correlation heatmap to study relationships between numerical features
correlation_matrix = df[numerical_features].corr()
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()
[Figure: correlation matrix heatmap of the numerical features]
In [ ]:
# Pairplot to visualize the relationships between features
sns.pairplot(df, vars=numerical_features, hue="Gender")
plt.show()
[Figure: pairplot of the numerical features, colored by Gender]
In [ ]:
# Scale the numerical features to the [0, 1] range
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df[['High School GPA', 'Swimming Time 100m (sec)',
       'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
       'Years Competitive Swimming']])  # Categorical columns are not included
In [ ]:
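# Categorical columns to encode
categorical_columns = ['Gender','State', 'Swimming Type',
                     'Distance Specialization', 'Swim Club Membership']

# One-hot encode the categorical columns, dropping the first level of each to avoid redundancy
# Note: the encoded columns are kept in df for cluster profiling; the models below are fitted on df_scaled only
df = pd.get_dummies(df, columns=categorical_columns, drop_first=True)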
In [ ]:
wcss = []
K_range = range(1, 11)  # Testing k values from 1 to 10
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(df_scaled)
    wcss.append(kmeans.inertia_)
In [ ]:
# Plotting the Elbow Curve
plt.figure(figsize=(10, 6))
plt.plot(K_range, wcss, marker='o', linestyle='--')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Within-Cluster Sum of Squares (WCSS)')
plt.title('Elbow Method for Optimal K')
plt.grid(True)
plt.show()
[Figure: elbow curve of WCSS against the number of clusters K]
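
The quantity plotted on the elbow curve, the within-cluster sum of squares, is what scikit-learn exposes as `inertia_`. The sketch below computes it by hand on four made-up points to show why it falls as K grows. The function name `wcss_by_hand` and the toy data are assumptions for illustration only.

```python
# Minimal sketch: within-cluster sum of squares (WCSS) computed by hand.

def wcss_by_hand(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total

# Two well-separated pairs of points (toy data)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_cluster = wcss_by_hand(points, [0, 0, 0, 0])   # everything in one cluster
two_clusters = wcss_by_hand(points, [0, 0, 1, 1])  # one cluster per pair
print(one_cluster, two_clusters)
```

WCSS can only shrink as clusters are added, so the elbow method looks for the K after which the decrease levels off rather than for the minimum.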
In [ ]:
# Fit the final K-Means model with the chosen K (K=6 here)
k_optimal = 6
kmeans = KMeans(n_clusters=k_optimal, random_state=42)  # Initialize KMeans with k_optimal clusters
kmeans_labels = kmeans.fit_predict(df_scaled)  # Fit KMeans and get cluster labels
df['KM'] = kmeans_labels  # Add the K-Means cluster labels to the original dataframe as 'KM'
In [ ]:
# Step 9: Add Cluster Labels to the Dataset
# Add the cluster labels to the original dataset for further analysis
df['Cluster'] = kmeans.labels_
In [ ]:
# Make sure the scaled data is a numpy array for plotting
scaled_data = np.asarray(df_scaled)

# Silhouette scores for different k values
silhouette_scores = []
k_values = [1, 2, 3, 4, 5, 6]

plt.figure(figsize=(15, 8))

for idx, k in enumerate(k_values):
   # Use a separate variable so the final model fitted earlier is not overwritten
   km = KMeans(n_clusters=k, random_state=42)
   cluster_labels = km.fit_predict(scaled_data)

   plt.subplot(2, 3, idx + 1)
   if k > 1:
       # The silhouette score is only defined for two or more clusters
       silhouette_avg = silhouette_score(scaled_data, cluster_labels)
       silhouette_scores.append(silhouette_avg)
       plt.title(f'Silhouette Score for k={k}: {silhouette_avg:.2f}')
       plt.scatter(scaled_data[:, 0], scaled_data[:, 1], c=cluster_labels, cmap='viridis', s=50)
   else:
       plt.title(f'k={k}')
       plt.scatter(scaled_data[:, 0], scaled_data[:, 1], s=50)

plt.tight_layout()
plt.show()
[Figure: scatter plots of the first two scaled features for k = 1 to 6, titled with silhouette scores]
In [ ]:
# Step 11: Evaluate the Clustering Performance (Silhouette Score)
# Compute the silhouette score, which evaluates how well each point lies within its cluster
silhouette_avg = silhouette_score(df_scaled, kmeans.labels_)
print(f"\nSilhouette Score for K = {k_optimal}: {silhouette_avg:.2f}")

# Visualize the Silhouette Scores for Each Sample
from sklearn.metrics import silhouette_samples  # Ensure silhouette_samples is imported

silhouette_values = silhouette_samples(df_scaled, kmeans.labels_)

plt.figure(figsize=(10, 6))
y_lower = 10
for i in range(k_optimal):
    ith_cluster_silhouette_values = silhouette_values[kmeans.labels_ == i]
    ith_cluster_silhouette_values.sort()
    size_cluster_i = ith_cluster_silhouette_values.shape[0]
    y_upper = y_lower + size_cluster_i
    color = sns.color_palette('viridis', k_optimal)[i]
    plt.fill_betweenx(np.arange(y_lower, y_upper), 0, ith_cluster_silhouette_values, facecolor=color, edgecolor=color, alpha=0.7)
    plt.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
    y_lower = y_upper + 10  # 10 for the space between clusters

plt.xlabel("Silhouette Coefficient Values")
plt.ylabel("Cluster Label")
plt.title("Silhouette Plot for K-Means Clustering with K = " + str(k_optimal))
plt.axvline(x=silhouette_avg, color="red", linestyle="--")
plt.grid(True)
plt.show()
Silhouette Score for K = 6: 0.71
[Figure: silhouette plot for K-Means clustering with K = 6]
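
To make the silhouette coefficient less of a black box, the following sketch computes it by hand for a tiny one-dimensional example. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to any other cluster, and the coefficient is `(b - a) / max(a, b)`. The function name `silhouette_by_hand` and the toy data are assumptions; the notebook itself relies on scikit-learn's `silhouette_score` and `silhouette_samples`.

```python
import math

def silhouette_by_hand(points, labels):
    """Per-point silhouette coefficients, (b - a) / max(a, b)."""
    if len(set(labels)) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    values = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        # Collect distances from point i to every other point, grouped by cluster
        dists = {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(label, []).append(math.dist(p, q))

        own = dists.pop(own_label, [])
        if not own:  # singleton cluster: coefficient is 0 by convention
            values.append(0.0)
            continue

        a = sum(own) / len(own)                            # cohesion
        b = min(sum(d) / len(d) for d in dists.values())   # separation
        values.append((b - a) / max(a, b))
    return values

# Two tight, well-separated groups on a line (toy data)
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]

values = silhouette_by_hand(points, labels)
mean_score = sum(values) / len(values)
print(values, mean_score)
```

Values near 1 mean a point sits well inside its cluster, values near 0 mean it lies between clusters, and negative values suggest it may be assigned to the wrong cluster.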
In [ ]:
# Step 12: Interpret the Results
# Analyze each cluster based on descriptive statistics to understand the applicant segments
print("\nCluster Analysis:")
print(df.groupby('Cluster').agg('mean', numeric_only=True))
Cluster Analysis:
         High School GPA  Swimming Time 100m (sec)  Swimming Time 200m (sec)  \
Cluster                                                                        
0               2.733404                 54.431064                132.746915   
1               2.633765                 65.031412                139.420588   
2               2.628167                 62.093833                111.922833   
3               3.129012                 55.679630                114.462222   
4               3.614388                 60.245612                128.367449   
5               2.964634                 64.667683                118.059756   

         Swimming Time 400m (sec)  Years Competitive Swimming  Gender_Male  \
Cluster                                                                      
0                      260.046277                    6.500000     0.521277   
1                      326.796471                    4.670588     0.576471   
2                      332.704167                    2.716667     0.450000   
3                      343.921235                    7.691358     0.419753   
4                      289.284592                    2.755102     0.448980   
5                      258.495976                    7.609756     0.548780   

         State_FL  State_GA  State_IL  State_MI  ...  State_PA  State_TX  \
Cluster                                          ...                       
0        0.031915  0.138298  0.106383  0.095745  ...  0.095745  0.117021   
1        0.152941  0.070588  0.082353  0.129412  ...  0.105882  0.082353   
2        0.150000  0.116667  0.083333  0.100000  ...  0.150000  0.033333   
3        0.074074  0.123457  0.111111  0.160494  ...  0.086420  0.061728   
4        0.071429  0.122449  0.163265  0.102041  ...  0.071429  0.091837   
5        0.036585  0.048780  0.121951  0.097561  ...  0.109756  0.134146   

         Swimming Type_Butterfly  Swimming Type_Freestyle  \
Cluster                                                     
0                       0.393617                 0.329787   
1                       0.235294                 0.317647   
2                       0.233333                 0.383333   
3                       0.283951                 0.308642   
4                       0.346939                 0.316327   
5                       0.365854                 0.365854   

         Distance Specialization_200m  Distance Specialization_400m  \
Cluster                                                               
0                            0.457447                      0.308511   
1                            0.223529                      0.376471   
2                            0.283333                      0.366667   
3                            0.308642                      0.333333   
4                            0.357143                      0.295918   
5                            0.341463                      0.378049   

         Swim Club Membership_Yes   KM    DBSCAN  Hierarchical  
Cluster                                                         
0                        0.404255  0.0  3.000000      1.010638  
1                        0.611765  1.0  2.000000      2.117647  
2                        0.533333  2.0 -0.016667      1.666667  
3                        0.419753  3.0  5.000000      1.802469  
4                        0.551020  4.0  1.000000      0.112245  
5                        0.512195  5.0  4.000000      0.560976  

[6 rows x 23 columns]
In [ ]:
# Step 15: Visualize Cluster Counts
# Use a count plot to visualize the number of data points in each cluster
plt.figure(figsize=(10, 6))
sns.countplot(x='Cluster', data=df, palette='viridis')
plt.xlabel('Cluster')
plt.ylabel('Number of Data Points')
plt.title('Count of Data Points in Each Cluster')
plt.grid(True)
plt.show()
[Figure: count of data points in each cluster]
In [ ]:
# Step 16: Pairplot for Cluster Analysis
# sns.pairplot creates its own figure, so a separate plt.figure() call would only produce an empty figure
sns.pairplot(df[['High School GPA', 'Swimming Time 100m (sec)',
                'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
                'Years Competitive Swimming', 'Cluster']],
           hue='Cluster',
           palette='viridis',
           plot_kws={'alpha': 0.6, 's': 80},
           diag_kind='kde')  # Use KDE plots on diagonal for better distribution visualization
plt.suptitle('Pairplot of Student Performance Metrics by Cluster', y=1.02, fontsize=16)
plt.tight_layout()
plt.show()
[Figure: pairplot of student performance metrics by cluster]
In [ ]:
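# Convert the numpy array back to a DataFrame with the original column names
df_scaled = pd.DataFrame(df_scaled, columns=numerical_columns.columns)
# Add cluster labels. Caution: every later cell that fits or scores a model on
# df_scaled will now treat 'KM' as a feature, which can inflate its scores; use
# df_scaled[numerical_features] there to cluster on the five scaled features only.
df_scaled['KM'] = kmeans_labels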

# Create heatmap
kmeans_feature_means = df_scaled.groupby('KM').mean()
plt.figure(figsize=(10, 6))
sns.heatmap(kmeans_feature_means, annot=True, cmap='coolwarm')
plt.title('Heatmap of Feature Means by KMeans Cluster')
plt.show()
[Figure: heatmap of scaled feature means by K-Means cluster]
In [ ]:
feature_combinations = [
   ('High School GPA', 'Swimming Time 100m (sec)'),
   ('High School GPA', 'Swimming Time 200m (sec)'),
   ('High School GPA', 'Swimming Time 400m (sec)'),
   ('High School GPA', 'Years Competitive Swimming')
]

fig, axes = plt.subplots(2, 2, figsize=(18, 12))
axes = axes.flatten()
fig.suptitle('Students Performance Feature Combinations', fontsize=16)

for i, (x_feature, y_feature) in enumerate(feature_combinations):
   ax = axes[i]
   sns.scatterplot(x=x_feature, y=y_feature, hue='KM', palette='viridis', data=df, s=100, alpha=0.7, ax=ax)
   ax.set_xlabel(x_feature)
   ax.set_ylabel(y_feature)
   ax.legend(title='Cluster')
   ax.grid(True)

plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()
[Figure: scatter plots of High School GPA against each other numerical feature, colored by K-Means cluster]
In [ ]:
# Step 22.4: Hyperparameter Tuning of K-Means
# Use the silhouette score to guide hyperparameter tuning with different values of 'n_init' and 'max_iter'
kmeans_tuned = KMeans(n_clusters=k_optimal, n_init=20, max_iter=500, random_state=42)
kmeans_tuned.fit(df_scaled)
silhouette_avg_tuned = silhouette_score(df_scaled, kmeans_tuned.labels_)
print(f"Silhouette Score with Tuned K-Means for K = {k_optimal}: {silhouette_avg_tuned:.2f}")
Silhouette Score with Tuned K-Means for K = 6: 0.71
In [ ]:
# Step 22.5: Use PCA for Dimensionality Reduction before Clustering
# Reduce the dataset to two principal components with PCA and then apply K-Means
from sklearn.decomposition import PCA  # For dimensionality reduction
pca = PCA(n_components=2)
X_pca = pca.fit_transform(df_scaled)
kmeans_pca = KMeans(n_clusters=k_optimal, random_state=42)
kmeans_pca.fit(X_pca)
# This score is computed in the 2-D PCA space, so it is not directly comparable
# with silhouette scores computed on the full scaled feature set
silhouette_avg_pca = silhouette_score(X_pca, kmeans_pca.labels_)
print(f"Silhouette Score with PCA for K = {k_optimal}: {silhouette_avg_pca:.2f}")
Silhouette Score with PCA for K = 6: 0.93
In [ ]:
# Step 1: Calculate WCSS and silhouette scores for a range of K
# Note: the models here are fitted on df_scaled, not on the PCA output X_pca
wcss = []  # Within-cluster sum of squares
silhouette_scores = []  # Silhouette scores for each K

# Test K values from 2 to 10
K_range = range(2, 11)
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42)  # Initialize KMeans with k clusters
    kmeans.fit(df_scaled)  # Fit KMeans to the scaled data
    wcss.append(kmeans.inertia_)  # Append WCSS (inertia)
    silhouette_scores.append(silhouette_score(df_scaled, kmeans.labels_))  # Append silhouette score

# Step 2: Plot the Elbow Method
plt.figure(figsize=(10, 6))
plt.plot(K_range, wcss, marker='o', linestyle='--', color='b', label='WCSS (Elbow)')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('WCSS')
plt.title('Elbow Plot for Scaled Data')
plt.legend()
plt.show()

# Step 3: Plot Silhouette Scores
plt.figure(figsize=(10, 6))
plt.plot(K_range, silhouette_scores, marker='o', linestyle='--', color='g', label='Silhouette Score')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Scores for Scaled Data')
plt.legend()
plt.show()
[Figures: elbow plot (WCSS) and silhouette scores for K = 2 to 10]
In [ ]:
# Step 1: Choose optimal K (e.g., from elbow or silhouette plot)
optimal_k = 6  # Replace with the chosen value of K

# Step 2: Fit KMeans on df_scaled (the scaled data, not the PCA output)
kmeans = KMeans(n_clusters=optimal_k, random_state=42)
df_scaled['Cluster'] = kmeans.fit_predict(df_scaled)  # Add cluster labels to the scaled DataFrame

# Step 3: Keep a copy of the labels under a more descriptive column name
df_scaled['Cluster_KMeans'] = df_scaled['Cluster']
In [ ]:
df_scaled.sample(3)
Out[ ]:
High School GPA Swimming Time 100m (sec) Swimming Time 200m (sec) Swimming Time 400m (sec) Years Competitive Swimming KM Cluster Cluster_KMeans
473 0.150754 0.871407 0.540225 0.274729 0.777778 5 4 4
102 0.683417 0.806858 0.377313 0.271268 0.222222 4 1 1
424 0.341709 0.177005 0.037208 0.822281 0.333333 2 2 2
In [ ]:
# Step 1: Calculate the mean of each scaled column grouped by K-Means cluster
cluster_means = df_scaled.groupby('Cluster_KMeans').mean()

# Step 2: Plot heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(cluster_means, annot=True, cmap='coolwarm', fmt='.2f')  # Create heatmap with annotations
plt.title('Heatmap of Feature Means by Cluster (K-Means)')
plt.show()
[Figure: heatmap of feature means by cluster (K-Means)]
In [ ]:
# Step 22.3: Use DBSCAN Clustering
# Use DBSCAN, which is a density-based clustering technique that can help improve clustering in non-spherical datasets.
from sklearn.cluster import DBSCAN

dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(df_scaled)
In [ ]:
# Evaluate DBSCAN Performance
silhouette_avg_dbscan = silhouette_score(df_scaled, dbscan_labels) if len(set(dbscan_labels)) > 1 else -1
print(f"Silhouette Score for DBSCAN: {silhouette_avg_dbscan:.2f}")
Silhouette Score for DBSCAN: 0.63
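
The two DBSCAN parameters used above, `eps` and `min_samples`, decide which points count as "core" points of a dense region. The sketch below shows only that first step on made-up data; it is not a full DBSCAN implementation, and the function name `core_points` is an assumption for illustration. As in scikit-learn, a point counts itself among its neighbours.

```python
import math

def core_points(points, eps, min_samples):
    """Indices of points that have at least min_samples points
    (themselves included) within distance eps."""
    cores = []
    for i, p in enumerate(points):
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_samples:
            cores.append(i)
    return cores

# Three points close together and one isolated point (toy data)
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

dense = core_points(points, eps=0.5, min_samples=3)     # the tight group qualifies
sparse = core_points(points, eps=0.05, min_samples=3)   # radius too small for anyone
print(dense, sparse)
```

Because min-max scaling puts every feature in the range 0 to 1, `eps=0.5` is a fairly large radius on that scale, so it is worth trying several smaller values before settling on one.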
In [ ]:
# Fitting DBSCAN
dbscan = DBSCAN()  # Initialize DBSCAN with default parameters
dbscan_labels = dbscan.fit_predict(df_scaled)  # Fit DBSCAN and get cluster labels
df['DBSCAN'] = dbscan_labels  # Add DBSCAN cluster labels to the original dataframe
In [ ]:
# Silhouette scores for different epsilon values
eps_values = [0.5]  # List of epsilon values to evaluate
silhouette_dbscan = []  # List to store silhouette scores

for eps in eps_values:
    dbscan = DBSCAN(eps=eps)  # Initialize DBSCAN with a specific epsilon
    labels = dbscan.fit_predict(df_scaled)  # Fit DBSCAN and get labels
    if len(set(labels)) > 1:  # Check if more than one cluster is formed
        silhouette_avg = silhouette_score(df_scaled, labels)  # Calculate silhouette score
        silhouette_dbscan.append((eps, silhouette_avg))  # Append the epsilon and silhouette score to the list
In [ ]:
# Printing silhouette scores
for eps, score in silhouette_dbscan:
    print(f'Epsilon: {eps}, Silhouette Score: {score}')
Epsilon: 0.5, Silhouette Score: 0.6253645793716519
In [ ]:
# Plotting silhouette scores and clusters for DBSCAN
eps_values = [0.5]
plt.figure(figsize=(15, 15))

for idx, eps in enumerate(eps_values):
    dbscan = DBSCAN(eps=eps)
    labels = dbscan.fit_predict(df_scaled)

    if len(set(labels)) > 1:
        silhouette_avg = silhouette_score(df_scaled, labels)
        plt.subplot(3, 2, idx + 1)
        plt.title(f'Epsilon: {eps}, Silhouette Score: {silhouette_avg:.2f}')
        # Convert df_scaled to a numpy array and plot its first two columns
        data = df_scaled.to_numpy()
        plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis', s=50)

plt.tight_layout()
plt.show()
[Figure: DBSCAN clusters on the first two scaled features for epsilon = 0.5, titled with the silhouette score]
In [ ]:
# Build a DataFrame that holds only the five scaled feature columns
df_scaled_dbscan = pd.DataFrame(df_scaled, columns=['High School GPA', 'Swimming Time 100m (sec)',
                                                         'Swimming Time 200m (sec)', 'Swimming Time 400m (sec)',
                                                         'Years Competitive Swimming'])
df_scaled_dbscan['DBSCAN'] = dbscan.labels_

# Create heatmap
dbscan_feature_means = df_scaled_dbscan.groupby('DBSCAN').mean()
plt.figure(figsize=(10, 6))
sns.heatmap(dbscan_feature_means, annot=True, cmap='coolwarm')
plt.title('Heatmap of Feature Means by DBSCAN Cluster')
plt.show()
[Figure: heatmap of feature means by DBSCAN cluster]
In [ ]:
# Create boxplots for all numerical features
features_to_plot = ['Years Competitive Swimming', 'High School GPA',
                  'Swimming Time 100m (sec)', 'Swimming Time 200m (sec)',
                  'Swimming Time 400m (sec)']

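fig, axes = plt.subplots(2, 4, figsize=(20, 12))
axes = axes.flatten()

# Hide the unused subplots (the 2x4 grid has 8 slots, but only 5 features are plotted)
for ax in axes[len(features_to_plot):]:
   ax.set_visible(False)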

for i, feature in enumerate(features_to_plot):
   sns.boxplot(x='DBSCAN', y=feature, data=df, palette='viridis', ax=axes[i])
   axes[i].set_title(f'Boxplot of {feature} by Cluster')
   axes[i].set_xlabel('Cluster')
   axes[i].set_ylabel(feature)

plt.tight_layout()
plt.show()
[Figure: boxplots of each numerical feature by DBSCAN cluster]
In [ ]:
# Cluster Density Plot with correct feature names
plt.figure(figsize=(10, 6))
sns.scatterplot(x='High School GPA', y='Swimming Time 100m (sec)',
               hue='DBSCAN', data=df, palette='viridis')
plt.title('DBSCAN Cluster Density Plot')
plt.show()
[Figure: DBSCAN clusters, High School GPA vs. Swimming Time 100m (sec)]
In [ ]:
# Cluster Density Plot with correct feature names
plt.figure(figsize=(10, 6))
sns.scatterplot(x='High School GPA', y='Swimming Time 200m (sec)',
               hue='DBSCAN', data=df, palette='viridis')
plt.title('DBSCAN Cluster Density Plot')
plt.show()
[Figure: DBSCAN clusters, High School GPA vs. Swimming Time 200m (sec)]
In [ ]:
# Cluster Density Plot with correct feature names
plt.figure(figsize=(10, 6))
sns.scatterplot(x='High School GPA', y='Swimming Time 400m (sec)',
               hue='DBSCAN', data=df, palette='viridis')
plt.title('DBSCAN Cluster Density Plot')
plt.show()
[Figure: DBSCAN clusters, High School GPA vs. Swimming Time 400m (sec)]
In [ ]:
# Cluster Density Plot with correct feature names
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Years Competitive Swimming', y= 'High School GPA',
               hue='DBSCAN', data=df, palette='viridis')
plt.title('DBSCAN Cluster Density Plot')
plt.show()
[Figure: DBSCAN clusters, Years Competitive Swimming vs. High School GPA]
In [ ]:
# Scaling the data for hierarchical clustering
scaler = MinMaxScaler()
df_scaled_hc = scaler.fit_transform(df[['High School GPA',
                                     'Swimming Time 100m (sec)',
                                     'Swimming Time 200m (sec)',
                                     'Swimming Time 400m (sec)',
                                     'Years Competitive Swimming']])
In [ ]:
# Plotting the dendrogram
plt.figure(figsize=(18, 7))
dendrogram = sch.dendrogram(sch.linkage(df_scaled_hc, method='ward'))  # Create dendrogram using Ward's linkage
plt.title('Dendrogram for Hierarchical Clustering')
plt.xlabel('students')
plt.ylabel('Euclidean Distances')
plt.show()
[Figure: dendrogram for hierarchical clustering (Ward linkage)]
In [ ]:
# Fitting the hierarchical clustering to the dataset
hc = AgglomerativeClustering(n_clusters=4, metric='euclidean', linkage='ward')  # Initialize Agglomerative Clustering with 4 clusters
hc_labels = hc.fit_predict(df_scaled_hc)  # Fit the model and get cluster labels
df['Hierarchical'] = hc_labels  # Add the cluster labels to the original dataframe
In [ ]:
# Silhouette score for Hierarchical clustering
silhouette_hc = silhouette_score(df_scaled_hc, hc_labels)  # Calculate silhouette score for hierarchical clustering
print(f'Silhouette Score for Hierarchical Clustering: {silhouette_hc}')  # Print silhouette score
Silhouette Score for Hierarchical Clustering: 0.11617031200512948
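
Ward linkage, used for both the dendrogram and the agglomerative model above, merges at each step the pair of clusters whose union increases the total within-cluster sum of squares the least. A minimal sketch of that merge cost on toy data follows; the function name `ward_merge_cost` and the points are assumptions for illustration, and this is not how SciPy or scikit-learn implement it internally.

```python
def ward_merge_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares caused by merging two clusters:
    (n_a * n_b) / (n_a + n_b) * squared distance between the centroids."""
    n_a, n_b = len(cluster_a), len(cluster_b)
    dim = len(cluster_a[0])
    centroid_a = [sum(p[d] for p in cluster_a) / n_a for d in range(dim)]
    centroid_b = [sum(p[d] for p in cluster_b) / n_b for d in range(dim)]
    sq_dist = sum((centroid_a[d] - centroid_b[d]) ** 2 for d in range(dim))
    return n_a * n_b / (n_a + n_b) * sq_dist

# Toy clusters: two pairs of points
left = [(0.0, 0.0), (0.0, 2.0)]
right = [(10.0, 0.0), (10.0, 2.0)]
near = [(1.0, 0.0), (1.0, 2.0)]

far_cost = ward_merge_cost(left, right)   # distant clusters are expensive to merge
near_cost = ward_merge_cost(left, near)   # nearby clusters are cheap to merge
print(far_cost, near_cost)
```

Ward's method would merge `left` with `near` before `right`, since that merge adds far less within-cluster variance.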
In [ ]:
# Heatmap for Hierarchical clustering with improved visualization
hierarchical_feature_means = df.groupby('Hierarchical').mean()

plt.figure(figsize=(12, 8))
sns.heatmap(hierarchical_feature_means[['High School GPA',
                                     'Swimming Time 100m (sec)',
                                     'Swimming Time 200m (sec)',
                                     'Swimming Time 400m (sec)',
                                     'Years Competitive Swimming']],
           annot=True,
           cmap='coolwarm',
           fmt='.2f',
           cbar_kws={'label': 'Mean Value'},
           yticklabels=['Cluster ' + str(x) for x in range(len(hierarchical_feature_means))])

plt.title('Feature Means by Hierarchical Cluster', pad=20, fontsize=14)
plt.ylabel('Clusters')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
Figure: Feature Means by Hierarchical Cluster (heatmap, selected features)
In [ ]:
# Heatmap of mean feature values per hierarchical cluster (all numeric features)
# numeric_only=True skips any non-numeric columns, which newer pandas versions refuse to average
hierarchical_feature_means = df.groupby('Hierarchical').mean(numeric_only=True)

# Create heatmap with all features
plt.figure(figsize=(16, 10))
sns.heatmap(hierarchical_feature_means,
            annot=True,
            cmap='coolwarm',
            fmt='.2f',
            cbar_kws={'label': 'Mean Value'},
            # Label rows from the actual cluster ids rather than a positional range
            yticklabels=[f'Cluster {c}' for c in hierarchical_feature_means.index])

plt.title('Feature Means by Hierarchical Cluster (All Features)', pad=20, fontsize=14)
plt.ylabel('Clusters')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
Figure: Feature Means by Hierarchical Cluster (heatmap, all features)
In [ ]:
# Create boxplots for selected features, one subplot per feature
features_to_plot = ['High School GPA', 'Swimming Time 100m (sec)', 'Swimming Time 200m (sec)',
                    'Swimming Time 400m (sec)', 'Years Competitive Swimming']

fig, axes = plt.subplots(1, 5, figsize=(20, 6))

for i, feature in enumerate(features_to_plot):
    sns.boxplot(x='Hierarchical', y=feature, data=df, palette='viridis', ax=axes[i])
    axes[i].set_title(f'Boxplot of {feature} by Cluster')
    axes[i].set_xlabel('Cluster')
    axes[i].set_ylabel(feature)

plt.tight_layout()
plt.show()
Figure: Boxplots of selected features by hierarchical cluster
In [ ]:
!jupyter nbconvert --to html '/content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.ipynb'
[NbConvertApp] Converting notebook /content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.ipynb to html
[NbConvertApp] Writing 1376712 bytes to /content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.html
In [ ]:
!apt-get install -y pandoc
In [ ]:
!apt-get install -y texlive-xetex texlive-fonts-recommended texlive-plain-generic
In [ ]:
!jupyter nbconvert --to pdf '/content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.ipynb'
[NbConvertApp] Converting notebook /content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.ipynb to pdf
[NbConvertApp] Support files will be in Assignment_1_Ndumnwere_Ezinne_files/
[NbConvertApp] Making directory ./Assignment_1_Ndumnwere_Ezinne_files
[NbConvertApp] Writing 87292 bytes to notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', 'notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', 'notebook']
[NbConvertApp] WARNING | bibtex had problems, most likely because there were no citations
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 552669 bytes to /content/drive/My Drive/Assignment1/Assignment_1_Ndumnwere_Ezinne.pdf